O que é?
Ashish Kumar Sultania. Monitoring and Failure Recovery ofCloud-Managed Digital Signage. University of Tartu, Estônia, 2017. The full backup often makes many redundant data copies, as datasets [... may not] have many changes between the backups. [...] Therefore, to eliminate redundancy, a space saving mechanism called Data deduplication can be used. It replaces multiple copies of data with references to a single compressed copy, thereby reducing the amount of needed space capacity. There are various tools available for data deduplication such as FSlint and fdupes. These tools scan directories to search duplicate files and provide actions such as to show, delete or replace the files with hardlinks. They work by comparing filesizes, MD5 signatures of partial file, MD5 signatures of the entire file, and then a cryptographic hash function of the entire file for verification. The comparisons of both these tools can be checked at [71] resulting that the FSlint is better in detecting duplicates.
References
Como funciona?
In summary the algorithm is:
Como usar?
Manual: FSlint/Using FSlint/Duplicates. FLOSS Manuals website.